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Abstract 

We present a new adaptive Monte Carlo integration algorithm for 
ill-behaved integrands with non-factorizable singularities. The algo- 
rithm combines Vegas with multi channel sampling and performs sig- 
nificantly better than Vegas for a large class of integrals appearing in 
physics. 



1 Introduction 

Throughout physics, it is frequently necessary to evaluate the integral /(/) 
of a function / on a manifold M using a measure /i 

/(/)= / d/i(p)/(p). (1) 
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More often than not, an analytical evaluation in terms of elementary or 
known special functions is impossible and we have to rely on numerical meth- 
ods for estimating /(/). A typical example is given by the integration of 
differential cross sections on a part of phase space to obtain predictions for 
event rates in scattering experiments. 

In more than three dimensions, standard quadrature formulae are not 
practical and Monte Carlo integration is the only option. As is well known, 
/(/) is estimated by 



where g is the probability density (with respect to the measure n) of the 
randomly distributed Pi, e.g. g{p) = 1/ Vol(M) for uniformly distributed pj. 
The error of this estimate is given by the square root of the variance 



which suggests to choose a g that minimizes V{f). If / is a wildly fluctuating 
function, this optimization of g is indispensable for obtaining a useful accu- 
racy. Typical causes for large fluctuations are integrable singularities of / 
or /i inside of M or non-integrable singularities very close to M. Therefore, 
we will use the term "singularity" for those parts of M in which there are 
large fluctuations in / or /x. 

Manual optimization of g is often too time consuming, in particular if 
the dependence of the integral on external parameters (in the integrand and 
in the boundaries) is to be studied. Adaptive numerical approaches are 
more attractive in these cases. The problem of optimizing g numerically has 
been solved for factorizable distributions g and measures /i by the classic 
Vegas [0] algorithm long ago. Factorizable g and /i are special, because the 
computational costs for optimization rise only linearly with the number of 
dimensions. In all other cases, there is a prohibitive exponential rise of the 
computational costs with the number of dimensions. 

The property of factorization depends on the coordinate system, of course. 
Consider, for example, the functions 








(4a) 
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f2{Xi,X2) = -2 (4b) 

on M = (—1,1) (S>( — 1,1) with the measure d/i = dxi A dx2- Obviously, 
/i is factorizable in Cartesian coordinates, while /2 is factorizable in polar 
coordinates. Vegas will sample either function efficiently for arbitrary 61^2 
in suitable coordinate systems, but there is no coordinate system in which 
Vegas can sample the sum fi + /2 efficiently for small 61^2- 

In this note, we present a generalization of the Vegas algorithm from 
factorizable distributions to sums of factorizable distributions, where each 
term may be factorizable in a different coordinate system. This larger class 
includes most of the integrands appearing in particle physics and empirical 
studies have shown a dramatic increase of accuracy for typical integrals. 
Technically, this generalization is the combination of the Vegas algorithm 
with adaptive multi channel sampling 0. 

In section H, we will discuss the coordinate transformations employed by 
the algorithm and in section |], we will describe the adaptive multi channel 
algorithm. Finally, I will discuss the performance of a ffist implementation 
of the algorithm in section ^ and conclude. 

2 Maps 

The problem of estimating /(/) can be divided naturally into two parts: 
parametrization of M and sampling of the function /. While the estimate 
will not depend on the parametrization, the error will. 

In general, we need an atlas with more that one chart (j) to cover the 
manifold M. We can ignore this technical complication in the following, 
because, for the purpose of integration, we can decompose M such that each 
piece is covered by a single chart. Moreover, a single chart suffices in most 
cases of practical interest, since we are at liberty to remove sets of measure 
zero from M. For example, after removing a single point, the unit sphere 
can be covered by a single chart. 

Nevertheless, even if we are not concerned with the global properties 
of M that require the use of more than one chart, the language of differential 
geometry will allow us to use our geometrical intuition. Instead of pasting 
together locally flat pieces, we will paste together factorizable pieces, which 
can be overlapping, because integration is an additive operation. 
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For actual computations, it is convenient to use the same domain for the 
charts of all manifolds. The obvious choice for n-dimensional manifolds is 
the open ?7,-dimensional unit hypercube 



f/ = (0,l) 



(5) 



Sometimes, it will be instructive to view the chart as a composition (p = ipox 
with an irregularly shaped P e R" as an intermediate step 




(6) 



f otP 



(in all commutative diagrams, solid arrows are reserved for bijections and 
dotted arrows are used for other morphisms). The integral (P can now be 
written 



/(/) 



d"x 



dx 



(7) 




and it remains to sample \d(f)/dx\- {f ocj)) on U . Below, it will be crucial that 
there is more than one way to map U onto M 



(8) 



and that we are free to select the map most suitable for our purposes. 

The ideal choice for would be a solution of the partial differential equa- 
tion \d(j)/dx\ = l/(/ o 0), but this is equivalent to an analytical evaluation 
of /(/) and is impossible for the cases under consideration. A more realistic 
goal is to find a such that \d(j)/dx\ ■ (/ o 0) has factorizable singularities 
and is therefore sampled well by Vegas. This is still a non-trivial problem, 
however. 

For example, consider the phase space integration for gluon radiation 
e~^e~ — >■ qqg. From the Feynman diagrams in figure |l] it is obvious that the 
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(12 
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Figure 1: e+e qqg 



squared matrix element will have singularities in the variables Si/2 = {(I1/2 + 
ky. Thus, adaptive sampling using Vegas would benefit from a parametriza- 
tion using both si and S2 as coordinates in the intermediate space P. Un- 
fortunately, the invariant phase space measure for such a parametrization 
involves the Gram determinant in the form 1 / a/A4(pi,P2; Qi, <l2)i which will 
lead to non-factorizable singularities at the edges of phase space. Note that 
the very elegant phase space parametrizations of the RAMBO type are 
not useful in this case, because there is no simple relation between the coordi- 
nates on U and the invariants in which the squared matrix elements can have 
singularities. On the other hand, it is straightforward to find parametriza- 
tions that factorize the dependency on si or S2 separately. 

Returning to the general case, consider different maps (pi : U ^ M 
and probabihty densities Qi : U [0, 00). Then the function 



i=l 



-1 



dp 



is a probability density (7 : M — > [0, 00) 



d/i(p) g{p) = 1 



M 



as long as the Qi and at are properly normalized 



j-l 

/ 5fi(x)d"x = 1 , = 1 , < ftj < 1 . 

•^0 i=i 



(9) 



(10) 



(11) 



From the definition (0), we have obviously 



Nc 



i=i 



dp 



9{P) 



(12) 
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and, after pulling back from M to U 



No .1 



(13) 



we find the estimate 



1=1 
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(14) 



9i 



The factorized gi in (|T2D and (|Tj) can be optimized using the classic Vegas 
algorithm [|[] unchanged. However, since we have to sample with a separate 
adaptive grid for each channel, a new implementation is required for 
technical reasons. 



Using the maps vTj 



: U ^ U introduced in 



we can write 



the g o (pi : U [0, oo) from ([T^) as 



g o 



dx 



Nc 



\ 



dx 



\ 



J 



(15) 



From a geometrical perspective, the maps are just the coordinate transfor- 
mations from the coordinate systems in which the other singularities factorize 
into the coordinate system in which the current singularity factorizes. 

Note that the integral in (12) does not change, when we use 0j : [/ — *• 
Mi D M, if we extent / from M to Mi by the definition f{Mi \ M) = 0. 
This is useful, for instance, when we want to cover (—1, 1) ® (—1, 1) by 
both Cartesian and polar coordinates. This causes, however, a problem with 



the 7ri2 in (|T^). In the diagram 



Pi 




-Ml- 



U 



M 



vri2 



i2 




(16) 



the injections ti^2 are not onto and since 7ri2 is not necessarily a bijection 
anymore, the Jacobian \dTiij/dx\ may be ill-defined. But since f{Mi\M) = 0, 



6 



we only need the unique bijections (p^ 2 cind 7r^2 that make the diagram 




commute. 

In many apphcations, the dependence of an integral on external param- 
eters has to be studied. Often, the vTjj will not depend on these parameters 
and we can rely on Vegas to optimize the Qi for each parameter set. In the 
next section, we will show how to optimize the Ui numerically as well. 



3 Multichannel 

Up to now, we have not specified the cti, they are only subject to the condi- 
tions (p^. Intuitively, we expect the best results when the are proportional 
to the contribution of their corresponding singularities to the integral. The 
option of tuning the ctj manually is not attractive if the optimal values depend 
on varying external parameters. Instead, we use a numerical procedure 0] 
for tuning the a,. 

We want to minimize the variance (|^) with respect to the ttj. This is 
equivalent to minimizing 

^^^'"^^//^^^'^'"^^^ ^W)) ^^^^ 

with respect to a with the subsidiary condition 'Y^^oti = 1. After adding a 
Lagrange multiplier, the stationary points of the variation are given by the 
solutions to the equations 

Wt:W,{f,a) = W{f,a) (19) 

where 

M,(/,„).-^P,(/.«)-/'..Wd».(f|||)^ (20) 

and 

Wif,a) = Y,c^^W^if,a). (21) 

1=1 
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It can easily be shown 0] that the stationary points (^) correspond to local 
minima. If we use 



Ni = aiN (22) 

to distribute sampling points among the channels, the Wi{f,a) are just 
the contributions from channel i to the total variance. Thus we recover the 
familiar result from stratified sampling, that the overall variance is minimized 
by spreading the variance evenly among channels. 

The Wi{f,a) can be estimated with very little extra effort while sam- 
pling /(/) (ct. 0) 



W,")= . (23) 




Note that the factor of Qi/ g from the corresponding formula in is absent 
from (^3D, because we are already sampling with the weight Qi in each channel 
separately. 

The equations ([T9|) are a fixed point of the prescription 



= — jTTT^^ — — g, [p > 0) (24) 



for updating the weights ctj. There is no guarantee that this fixed point 
will be reached from a particular starting value, such as ctj = 1/Nc, through 
successive applications of (|2^). Nevertheless, it is clear that ( P^ will concen- 
trate on the channels with large contributions to the variance, as suggested 
by stratified sampling. Furthermore, empirical studies show that ( ^4|) is suc- 
cessful in practical applications. The value /3 = 1/2 has been proposed in ||2[, 
but it can be beneficial in some cases to use smaller values like j3 = 1/4 to 
dampen statistical fluctuations. 



4 Performance 

Both the implementation and the practical use of the algorithm proposed 
in this note are more involved than the application of the original Vegas 
algorithm. Therefore it is necessary to investigate whether the additional 
effort pays off in terms of better performance. 
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A test version of an implementation of this algorithm, "VAMP", in For- 
tran IP has been used for empirical studies. This implementation features 
other improvements over "Vegas Classic" — most notably system independent 
and portable support for parallel processing and support for unweighted event 
generation — and will be published when the documentation is finalized. 
The preliminary version is available from the author upon request. 

4.1 Costs 

There are two main sources of additional computational costs: at each sam- 
pling point the function goip^ has be evaluated, which requires the computa- 
tion of the iVc — 1 maps Tiij together with their Jacobians and of the N(. — l 
probability distributions gi of the other Vegas grids (cf. (|15D). 

The retrieval of the current giS requires a bisection search in each dimen- 
sion, i.e. a total of 0{{Nc — 1) ■ ridim ■ log2(ndiv)) executions of the inner loop 
of the search. For simple integrands, this can indeed be a few times more 
costly than the evaluation of the integrand itself. 

The computation of the Tiij can be costly as well. However, unlike the gi, 
this computation can usually be tuned manually. This can be worth the effort 
if many estimations of similar integrals are to be performed. Empirically, 
straightforward implementations of the vTjj add costs of the same order as 
the evaluation of the gi. 

Finally, additional iterations are needed for adapting the weights of the 
multi channel algorithm described in (^. Their cost is negligible, however, 
because they are usually performed with far fewer sampling points than the 
final iterations. 



4.2 Gains 

Even in cases in which the evaluation of gi increases computation costs by a 
whole order of magnitude, any reduction of the error by more than a factor 
of 4 will make the multi channel algorithm economical. In fact, it is easy 
to construct examples in which the error will be reduced by more than two 
orders of magnitude. The function 

144atan(l/26) V^i((r3 - 1/2)2 + 52) + ^^((ra - 1/2)2 + ^2) 
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Figure 2: Comparison of the sampling error for the integral of / in (|25|) as 
a function of the width parameter b for the two algorithms at comparable 
computational costs. 

6(-l < Xi,X2,Xs < 1) 

xj + b^ 

with r2 = a/x^ + x| and = x\^ x\^ is constructed such that it can 
easily be normalized 

y d^x/(x) = l (26) 

and allows a check of the result. The three terms factorize in spherical, cylin- 
drical and Cartesian coordinates, respectively, suggesting a three channel 
approach. After five steps of weight optimization consisting four iterations 
of 10^ samples, we have performed three iterations of 10^ samples with the 
VAMP multi channel algorithm. Empirically, we found that we can perform 
four iterations of 5 ■ 10^ samples and three iterations of 5 ■ 10^ samples with 
the class Vegas algorithm during the same time period. Since the functional 
form of / is almost as simple as the coordinate transformation, the fivefold 
increase of computational cost is hardly surprising. 

In figure |^, we compare the error estimates derived by the classic Vegas 
algorithm and by the three channel VAMP algorithm. As one would expect, 
the multi channel algorithm does not offer any substantial advantages for 
smooth functions (i.e. b > 0.01). Instead, it is penalized by the higher 
computational costs. On the other hand, the accuracy of the classic Vegas 
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algorithm deteriorates like a power with smaller values of b. At the same 
time, the multi channel algorithm can adapt itself to the steeper functions, 
leading to a much slower loss of precision. 

The function / in ( pSf ) has been constructed as a showcase for the multi 
channel algorithm, of course. Nevertheless, more complicated realistic ex- 
amples from particle physics appear to gain about an order of magnitude in 
accuracy. Furthermore, the new algorithm allows unweighted event genera- 
tion. This is hardly ever possible with the original Vegas implementation, 
because the remaining fluctuations typically reduce the average weight to 
very small numbers. 



4.3 A Cheaper Alternative 



There is an alternative approach that avoids the evaluation of the QiS, sacri- 
ficing flexibility. Fixing the gi at unity, we have for ^ : M ^ [0, oo) 



E 

i=l 



Oli 



dp 



(27) 



and the integral becomes 



a,; 



i=l 







L 


dp 



9{P) 



i=l 







(28) 



Vegas can now be used to perform adaptive integrations of 

/.(/) = 



g{(p^{x)) 



(29) 



individually. In some cases it is possible to construct a set of (pi such that /«(/) 
can estimated efficiently. The optimization of the weights ai can again be 
effected by the multi channel algorithm described in (^. 

The disadvantage of this approach is that the optimal 0i will depend sen- 
sitively on external parameters and the integration limits. In the approach 
based on the g in (^ Vegas can take care of the integration limits automat- 
ically. 



11 



5 Conclusions 



We have presented an algorithm for adaptive Monte Carlo integration of func- 
tions with non-factorizable singularities. The algorithm shows a significantly 
better performance for many ill-behaved integrals than Vegas. 

The applications of this algorithm are not restricted to particle physics, 
but a particularly attractive application is provided by automated tools for 
the calculation of scattering cross sections. While these tools can currently 
calculate differential cross sections without manual intervention, the phase 
space integrations still require hand tuning of mappings for importance sam- 
pling for each parameter set. The present algorithm can overcome this prob- 
lem, since it requires to solve the geometrical problem of calculating the 
maps TTjj in ( ]T3| ) for all possible invariants only once. The selection and 
optimization of the channels can then be performed algorithmically. 

The application of the algorithms presented here to quasi Monte Carlo 
integration forms an interesting subject for future research. Other options 
include maps 0j depending on external parameters, which can be optimized 
as well. A simple example are rotations, which can align the coordinate 
systems with the singularities, using correlation matrices 0. 
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